Skip to content

fix: lifecycle & teardown correctness (PR1 of bug-audit-v2)#108

Merged
lesnik512 merged 14 commits into
mainfrom
fix/bug-audit-v2-pr1-lifecycle
Jun 5, 2026
Merged

fix: lifecycle & teardown correctness (PR1 of bug-audit-v2)#108
lesnik512 merged 14 commits into
mainfrom
fix/bug-audit-v2-pr1-lifecycle

Conversation

@lesnik512

Copy link
Copy Markdown
Member

Summary

PR1 of the 2026-06-05 bug-audit-v2 plan. Closes ten lifecycle / teardown findings — every fix is "bootstrap touched state, teardown didn't clean it up" or "constructing two bootstrappers around the same app caused stacking." Sequenced per planning/specs/2026-06-05-bug-audit-v2-sequencing.md; per-task TDD plan at planning/plans/2026-06-05-pr1-lifecycle.md; parent audit at planning/specs/2026-06-05-bug-audit-v2.md.

Findings closed

  • LOG-1 — Document OTel set_tracer_provider set-once constraint. The SDK enforces it via _TRACER_PROVIDER_SET_ONCE.do_once(...); the "global reset on teardown" fix in the audit isn't achievable without touching SDK internals, so this becomes a class docstring that names the constraint and the supported lifecycle.
  • LOG-2 — Restore the two OTel-namespace stdlib loggers (opentelemetry.instrumentation.instrumentor, opentelemetry.trace) to their pre-bootstrap disabled state on teardown. Capture-and-restore via a new _prior_logger_disabled: dict[str, bool] field.
  • LOG-3LoggingInstrument.teardown() no longer dies on a handler.close() raise mid-loop. Errors are aggregated into TeardownError (matching BaseBootstrapper.teardown's pattern) so the rest of teardown — level reset, factory close — always runs. Factory-close errors are also caught and included in the aggregate so they don't silently replace handler errors via finally-replaces-pending semantics.
  • LOG-4FastStreamLoggingInstrument captures broker.config.logger.params_storage in bootstrap and restores it in teardown. Symmetric to LOG-2 on the broker side.
  • LOG-5 / SEC-4 — Replace assert precondition checks with explicit raises in _narrow_app (TypeError, per ruff's TRY004) and PyroscopeInstrument.bootstrap (RuntimeError, since the check is is None not isinstance). Closes both bandit B101 findings.
  • LOG-6 — Replace LitestarOpenTelemetryInstrumentationMiddleware._otel_apps from dict[int, ASGIApp] (id-keyed, never evicts) to weakref.WeakKeyDictionary[ASGIApp, ASGIApp] so wrapper apps GC with their next_app. Both .get() and __setitem__ are wrapped in contextlib.suppress(TypeError) so non-weakrefable callables fall through unharmed.
  • LOG-7FastMcpBootstrapper.__init__ now inspects application.providers for an existing _TeardownProvider and warns + skips re-attachment, preventing stacked teardown invocations when two bootstrappers are constructed around the same FastMCP app.
  • LOG-8FastAPIBootstrapper.__init__ uses application._lite_bootstrap_lifespan_attached (private attribute on the FastAPI instance) as a sentinel to detect and warn on duplicate lifespan-wrap attempts. Uses the library-private-attribute convention rather than squatting in Starlette's user-facing application.state namespace.
  • LOG-9SentryInstrument.teardown() calls sentry_sdk.flush(timeout=2) then sentry_sdk.init() (no args) to disable further event capture. The two existing tests that called sentry_sdk.init() in finally as a workaround now use instrument.teardown() instead.

Implementation notes

  • Two reviewer-driven course corrections during execution:
    • LOG-3 first landed with raise close_errors[0] (lost info on multiple failures and silent factory-close-error masking); rewritten to use the project's existing TeardownError(errors) pattern.
    • LOG-6's first landing only wrapped __setitem__ in contextlib.suppress; WeakKeyDictionary.get() also raises TypeError on non-weakrefable keys, so both call sites were wrapped and the test was rewritten to use a real non-weakrefable class (__slots__ = (), no __weakref__ slot).
    • LOG-8 first stored the sentinel on application.state; moved to a _lite_bootstrap_lifespan_attached direct attribute matching opentelemetry-instrumentation-fastapi's approach.
  • Two non-blocking follow-ups flagged by the final reviewer for PR2/PR3 scope:
    • Lift TeardownError aggregation pattern from LoggingInstrument to other instruments' teardown methods (currently only LoggingInstrument and BaseBootstrapper use it).
    • Document the "one bootstrapper per application" + "one OpenTelemetryInstrument per process" lifecycle constraints in README.md.

Test plan

  • just test — 164 passed (10 new tests + 1 updated existing), 100% coverage.
  • just lint-ci — clean (ruff format, ruff check, eof-fixer, ty).
  • bandit -r lite_bootstrap — no issues (both prior B101 findings closed by LOG-5/SEC-4).
  • Each commit is scoped to one finding ID so review/revert can land at the task level.

🤖 Generated with Claude Code

lesnik512 and others added 14 commits June 5, 2026 14:28
Verify-then-extend pass over the 2026-05-31 audit: 26/29 prior findings FIXED,
3 PARTIAL. Adds 26 new findings under UX, logic, security, and tests, with
file:line evidence and fix shapes. pip-audit and bandit run; results folded in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposes the 26 new audit findings into 3 themed PRs: lifecycle/teardown
correctness, config UX + security validation, and hygiene + CI gate.
Per-PR scope, locked decisions, and file lists included.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-task TDD steps covering all 10 sub-fixes (LOG-1..9, SEC-4) from the
2026-06-05 audit. LOG-1 corrected to docstring-only after verifying OTel's
set_tracer_provider is set-once-enforced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bandit B101 flagged the assert as stripped under `python -O`. Replace with
explicit raise so the invariant holds under all Python optimization levels.
Ruff's TRY004 enforces TypeError for isinstance-guarded raises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…EC-4 part B)

Bandit B101 flagged the assert as stripped under `python -O`. Replace with
explicit raise. Resolves the second of the two bandit B101 findings in PR1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The OTel SDK enforces set_tracer_provider as set-once per process. teardown()
calls shutdown() (added in CRIT-2) but cannot reset the process-global pointer.
Document the supported lifecycle: one OpenTelemetryInstrument per process.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bootstrap() previously set opentelemetry.instrumentation.instrumentor and
opentelemetry.trace loggers to disabled=True unconditionally, with no symmetric
restoration. Capture the pre-bootstrap state and restore on teardown so unrelated
code in the same process retains its expected logger configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (LOG-3)

A raise from handler.close() mid-loop previously left remaining handlers
attached, skipped the root-level reset, and never called close_handlers on the
factory. Wrap in try/finally and aggregate close errors into TeardownError
(matching BaseBootstrapper.teardown's pattern) so the rest of teardown completes
before all collected errors are re-raised together. Factory close failures are
included in the aggregation so they don't silently replace handler errors via
Python's finally-replaces-pending semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rdown (LOG-4)

bootstrap() mutates broker.config.logger.params_storage to inject a structlog
logger; teardown() now captures and restores the original value. Symmetric with
LoggingInstrument's parent teardown, which still runs via super().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bootstrap() called sentry_sdk.init() but no teardown reset the global state.
Process-local tests had to call sentry_sdk.init() in finally blocks as a
workaround; those are now replaced by instrument.teardown(). flush() drains
in-flight events before the reset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous dict[int, ASGIApp] keyed by id() never evicted, holding wrapper
OpenTelemetryMiddleware instances alive after Litestar dropped the next_app
reference. Switch to weakref.WeakKeyDictionary so wrappers GC with their apps.
Skip caching gracefully when an app doesn't support weak references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…G-7)

Constructing two FastMcpBootstrappers around the same application previously
stacked two _TeardownProvider instances, doubling the teardown call on shutdown.
Detect an existing _TeardownProvider in application.providers and skip re-attach
with a warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Constructing two FastAPIBootstrappers around the same application previously
stacked two lifespan wrappers, calling teardown twice on shutdown. Detect the
sentinel on application.state and skip re-wrap with a warning. Idempotent
teardown (CRIT-3) means the prior behavior was harmless, but this removes the
log noise and confusing stack depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-quality review on PR1 Task 9 noted the warning text didn't make clear
that the second FastMcpBootstrapper has no teardown wired up. Add that line.
Tests still match on "_TeardownProvider" so no test change needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lesnik512 lesnik512 self-assigned this Jun 5, 2026
@codecov

codecov Bot commented Jun 5, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...te_bootstrap/bootstrappers/fastapi_bootstrapper.py 100.00% <100.00%> (ø)
...te_bootstrap/bootstrappers/fastmcp_bootstrapper.py 100.00% <100.00%> (ø)
...bootstrap/bootstrappers/faststream_bootstrapper.py 100.00% <100.00%> (ø)
...e_bootstrap/bootstrappers/litestar_bootstrapper.py 100.00% <100.00%> (ø)
lite_bootstrap/instruments/logging_instrument.py 100.00% <100.00%> (ø)
..._bootstrap/instruments/opentelemetry_instrument.py 100.00% <100.00%> (ø)
lite_bootstrap/instruments/pyroscope_instrument.py 100.00% <100.00%> (ø)
lite_bootstrap/instruments/sentry_instrument.py 100.00% <100.00%> (ø)
tests/instruments/test_logging_instrument.py 100.00% <100.00%> (ø)
tests/instruments/test_opentelemetry_instrument.py 100.00% <100.00%> (ø)
... and 6 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lesnik512 lesnik512 merged commit 7c7e67d into main Jun 5, 2026
8 checks passed
@lesnik512 lesnik512 deleted the fix/bug-audit-v2-pr1-lifecycle branch June 5, 2026 15:40
lesnik512 added a commit that referenced this pull request Jun 5, 2026
Three PRs (#108, #109, #110), 26 findings closed, 153 → 187 tests, 100% coverage
throughout. Captures what went well (per-PR sequencing, two-stage review catching
three real bugs), what didn't (three implementer-hallucination incidents,
cross-task __post_init__ cascade missed in PR2 plan, OTel API misread in audit,
SEC-2 host parser untested against canonical localhost:4317 form), and six
action items to harden the pipeline for the next audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lesnik512 added a commit that referenced this pull request Jun 5, 2026
Bug-audit-v2 cycle: 26 findings across 4 PRs (#108-#111). Lifecycle hardening,
config validation, CI gate, generalized TeardownError aggregation. Two
behavior changes called out: FastAPIConfig no longer stomps user app fields;
CorsConfig wildcard+credentials now raises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant